Technical Report: CSVM format for scientific tabular data

Authors

Gérôme Beyries

Frédéric Rodriguez

Abstract

CSVM (CSV with metadata) is derived from the CSV format and is used for storing experimental data, models, and specifications. CSVM allows the storage of tabular data with a limited but extensible amount of metadata. This facilitates the exchange and long-term use of RAW data, because all the information needed to use the data later is included in the CSVM file. Basic CSVM files are readable by current tools for handling tables (e.g. spreadsheets). Using the full possibilities of the concept, it is possible to deviate from a strict table and also to annotate inside the data block. CSVM files are ASCII files and could provide a template for implementing best practices in handling RAW data, in exchange and normalization, in long-term resources, or in collaborative processes. In this document we describe the first release (CSVM-1) of the CSVM format.

Similar resources

Technical report: CSVM dictionaries

CSVM (CSV with Metadata) is a simple file format for tabular data. The possible application domain is the same as that of typical spreadsheet files, but CSVM is well suited for long-term storage and the inter-conversion of RAW data. CSVM embeds different levels for data, metadata and annotations in human-readable format and flat ASCII files. As a proof of concept, Perl and Python toolkits were designe...

Technical Report: CSVM Ecosystem

The CSVM format is derived from the CSV format and allows the storage of tabular-like data with a limited but extensible amount of metadata. This approach could help computer scientists because all the information needed to use the data later is included in the CSVM file. It is particularly well suited for handling RAW data in many scientific fields and for use as a canonical format. The...

On the communication of scientific data: The Full-Metadata Format

In this paper, we introduce a scientific format for text-based data files, which facilitates storing and communicating tabular data sets. The so-called Full-Metadata Format builds on the widely used INI-standard and is based on four principles: readable self-documentation, flexible structure, fail-safe compatibility, and searchability. As a consequence, all metadata required to interpret the ta...

Clustered Support Vector Machines

In many machine learning problems, the data are distributed nonlinearly. One way to address this kind of data is to train a nonlinear classifier such as a kernel support vector machine (kernel SVM). However, the computational burden of kernel SVM limits its application to large-scale datasets. In this paper, we propose a Clustered Support Vector Machine (CSVM), which tackles the data in a divi...

Epi Archive: automated data collection of notifiable disease data

Introduction: Most countries do not report national notifiable disease data in a machine-readable format. Data are often in the form of a file that contains text, tables and graphs summarizing weekly or monthly disease counts. This presents a problem when information is needed for more data intensive approaches to epidemiology, biosurveillance and public health as exemplified by the Biosurveilla...

Journal title:

CoRR

Volume abs/1207.5711

Publication date 2012
